ABSTRACT

The measurement of change is such a broad topic that this article must limit its focus to a few specific subtopics. These specific topics include: longitudinal research design, attrition in research studies, the statistical analysis of difference scores, and the comparison of analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA) techniques in analyzing repeated measures data. The purpose and sampling techniques, as well as the internal and external validity, are discussed for each measurement technique. The author concludes that a considerable number of problems are inherent in the measurement and analyses of change, especially in research designs of a longitudinal nature. However, cross-sectional/sequential type designs, which are required for valid measures of developmental change, are very costly but necessary if the research is to have any scientific value. Multivariate statistical procedures utilizing complete data sets will provide for valid and relatively powerful tests of hypotheses. (Author/MV)

The general topic to which this paper is directed, the measurement of change, is so broad, so all-encompassing, that any one article or presentation must limit its focus to a few specific subtopics. A psychometrician or researcher interested in statistics and methodology dealing with change would need to use an extensive list of key words in his literature search to keep abreast of the field. Terms such as developmental, longitudinal, growth, trend, repeated measures, change, curve fitting, and stochastic are just some of the descriptors, each one of which constitutes a specialized and relatively extensive body of knowledge. The specific topics dealt with in this paper are longitudinal research design, the statistical analysis of difference scores, and the comparison between ANOVA and MANOVA techniques in analyzing repeated measures data.



LONGITUDINAL RESEARCH DESIGN



History 



Cross-sectional studies, often with inadequate design and control, or longitudinal studies on a few subjects, were the most common basis for drawing inferences regarding developmental change in the early 1900's. Large scale longitudinal studies, like those of Terman, et al. (1925), were few and far between. However, by the 1950's and 60's, there seemed to be a considerable increase in the extent to which researchers were willing to embark on such extensive research projects, partially because of a better understanding of the pitfalls in cross-sectional research and also because of an increase in the availability of funding for this type of developmental research. More recently, methodological papers by Baltes (1968), Labouvie, Bartsch, Nesselroade, and Baltes (1974), and Schaie (1965) have revealed numerous shortcomings in the basic research design commonly employed in longitudinal studies. In light of these papers, both the external and internal validity of many longitudinal studies must now be questioned.

Baltes (1968, p. 149), in discussing the traditional cross-sectional and longitudinal designs, states that, "In the light of present standards of research methodology, both research designs appear to be relatively naive," and then later (1968, p. 153) claims that "both conventional designs have such a total absence of control as to be of almost no scientific value." For those investigators about to initiate such a research venture, there are procedures available which help circumvent these problems, but the researcher into the second or third year of a 20-year longitudinal study is now faced with a difficult decision. He must either abandon the study and start again, or attempt to incorporate controls into the study in an attempt to establish as much external and internal validity as possible. Obviously, future longitudinal studies must be designed with considerable care, with an associated increase in labour, subject and financial costs. A brief description of the problems associated with longitudinal studies follows, along with possible solutions to these problems, as well as comparisons between longitudinal and cross-sectional designs.

Cross-Sectional Studies

Purpose and sampling. As Schaie (1973, p. 164) points out, valid research design and sound data collection methodology can be employed only if "the specific developmental question is made explicit." If the sole purpose is to examine differences among cohorts at a single point in time, then a cross-sectional design will suffice. A major problem, however, is to obtain comparable samples from the different age groups. If one were to sample 20-year-olds and 50-year-olds from a given community, there would undoubtedly be a number of variables, other than age, distinguishing the two groups. The adventurous, or the dull, or the very talented, may have left the community, thus the 50-year-olds are a particular residual group. Furthermore, if the sample was drawn from volunteers, not only would the external validity be limited, but so would the internal validity. It is unlikely that volunteers from a 20-year-old population differ from 20-year-old non-volunteers in the same manner and degree as 50-year-old volunteers differ from their non-volunteer cohorts. Random sampling will permit comparisons among cohort populations within the sample domain, but inferences cannot be made beyond this population.



Internal and external validity. Cross-sectional studies confound the effects of aging with generational effects, thus introducing a source of error which may impair the internal validity of this design. The frequently held belief that many behavioral attributes decline with age, after peaking around age 25, was based on evidence gathered with cross-sectional studies. Subsequent longitudinal studies (Schaie and Strother, 1968; Schaie, Labouvie, and Barrett, 1973) have negated this hypothesis by showing virtually no change within individuals up to ages 40 and 50, but considerable between-cohort differences. Thus the early cross-sectional studies reflected between-generation differences and yet were interpreted as differences due to aging.

The problem of non-random population attrition, called selective survival by Baltes (1968), affects the external validity of both cross-sectional and longitudinal designs. Evidence is cited (Baltes, 1968) that a specific population at, say, age 20 changes in its composition over time in a selective manner so that the survivors by age 50 are the subjects who were the taller and more intelligent ones in the original sample. With a cross-sectional design, there is no way to control and/or examine this phenomenon.



Design and analysis. The usual experimental design for a cross-sectional study is a single factor, randomized groups design. Appropriate analysis for a single dependent variable would be a one-way ANOVA, with orthogonal polynomial decomposition of the sum of squares for cohorts possible if a trend analysis is desired. However, unlike a repeated measures design where distinct and appropriate error terms are available for each trend component, this design yields only a within-groups mean square which must be used as the denominator in all F tests. Consequently, the design results in statistical tests of relatively low power, both for the main effect and any single degree-of-freedom contrasts.
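The cross-sectional analysis just described can be sketched in a few lines. The following is only an illustration under assumed data: the function name, the three hypothetical cohorts, and the linear coefficients (-1, 0, 1) are not taken from the paper. It shows that the overall cohort test and the single degree-of-freedom trend contrast share the same within-groups mean square as their denominator.

```python
from statistics import mean

def one_way_anova_with_trend(groups, coeffs):
    """groups: one list of scores per cohort (hypothetical data).
    coeffs: orthogonal polynomial coefficients, one per cohort, summing to zero.
    Returns (F for cohorts, F for the trend contrast)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_within = ss_within / (n_total - k)  # the only error term available
    f_cohorts = (ss_between / (k - 1)) / ms_within
    # Single degree-of-freedom contrast, tested against the same error term.
    psi = sum(c * m for c, m in zip(coeffs, means))
    ss_trend = psi ** 2 / sum(c ** 2 / len(g) for c, g in zip(coeffs, groups))
    return f_cohorts, ss_trend / ms_within

# Three hypothetical cohorts measured once each.
cohorts = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_cohorts, f_linear = one_way_anova_with_trend(cohorts, [-1, 0, 1])
```

Because every F ratio here divides by the same between-subject error, individual differences inflate the denominator, which is the source of the low power noted above.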

Longitudinal Studies

Purpose and sampling. The usual purpose of a longitudinal study is to examine changes within individuals in terms of physical or behavioral development. Consequently, the procedures of the past have involved obtaining a relatively large random sample at one point in time followed by repeated observations of the same subjects for a period of time (a few months up to a lifetime). As only one cohort is needed, the sampling procedures do not have the problems cross-sectional studies do in equating samples across cohorts. If the longitudinal study is going to be a lifetime study, or of any considerable length of time, then a large initial base is necessary as considerable attrition is likely to occur (which causes numerous other problems). One major problem resulting from the continuous tracking and measuring of a large number of subjects is the financial cost: an NIH longitudinal study which monitored 50,000 children from pregnancy to seven years old cost 60 million dollars (Wall and Williams, 1970). Obviously any large scale longitudinal study requires funding from wealthy foundations or governmental agencies.



Internal and external validity. In contrast to the cross-sectional studies which confound age and generation effects, the longitudinal study confounds the effects of aging with those related to cultural changes. Over a 20-year period, many behavioral attributes show a pronounced change within society in general, and these changes show up as a change within individuals. Society's attitudes towards working mothers or pornography, which have undoubtedly changed over recent decades, would show up as a change in attitude from age 20 to age 50 in a longitudinal study. Phenomena such as those cited here would probably be correctly interpreted; however, with many other variables it is questionable whether any change can be primarily accounted for by aging or by cultural changes.

Selective sampling, selective survival, and selective drop-out (terms from Baltes, 1968) all tend to lower the external validity of longitudinal studies. The population which is apt to volunteer for a longitudinal study tends to be of a higher socio-economic status and intelligence than non-volunteers (Rose, 1965), and attrition from such studies is also selective in that those subjects dropping out (both refusers and movers) tend to be of lower intelligence (Labouvie, et al., 1974; Schaie, et al., 1973).



A third problem associated with longitudinal studies, and one that is not present in cross-sectional studies, is the repeated testing effect. Labouvie, et al. (1974, p. 202) conclude that "... the findings indicate that age-related longitudinal increases on intelligence variables are mainly due to retest effects." They feel that the internal validity of simple longitudinal studies is lowered to such an extent by repeated testing effects that any inferences about age-related changes are "unjustified and grossly misleading". There are two ways an investigator can test for and/or control for this testing effect. Schaie (1973) suggests retesting a subsample within a relatively short period of time, before any age or environmental influences are likely to have taken place, and if there is no change at this time, then the researcher can be confident that any differences in a year will not be due to the testing effect. If there are differences, then it will be necessary to utilize the other procedure, the introduction of a control group, which is discussed in the section on mixed designs.
Design and analysis. The most common method of analyzing longitudinal data is to treat it as a single factor, repeated measures design (or, if two or more groups, a k x p factorial experiment with repeated measures on the second factor, where k is the number of groups, and p is the number of testing sessions). The nature and degree of change over time can then be tested for statistical significance with either MANOVA or a repeated measures ANOVA; the advantages and disadvantages of these two methods are discussed in a subsequent section. Trend analysis, a very powerful statistical test with a repeated measures design, provides an indication of the significance of any polynomial trends over testing sessions.
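One way to see why trend analysis is powerful in a repeated measures design is to reduce each subject's scores to a single contrast score and test whether the mean contrast differs from zero. The sketch below assumes four equally spaced testing sessions and invented scores; the function name and data are illustrative only, not the author's procedure.

```python
from math import sqrt
from statistics import mean, stdev

def trend_t_test(scores, coeffs):
    """scores: one row per subject, one column per testing session.
    coeffs: orthogonal polynomial coefficients for the sessions.
    Returns (t, df) for the hypothesis that the mean contrast is zero."""
    contrasts = [sum(c * x for c, x in zip(coeffs, row)) for row in scores]
    n = len(contrasts)
    # The error term is the variability of the contrast itself across
    # subjects, so stable between-subject differences in level drop out.
    t = mean(contrasts) / (stdev(contrasts) / sqrt(n))
    return t, n - 1

# Hypothetical data: four subjects, four equally spaced sessions.
sessions = [
    [10, 12, 15, 16],
    [8, 9, 11, 14],
    [12, 12, 13, 15],
    [9, 11, 12, 12],
]
t_linear, df = trend_t_test(sessions, [-3, -1, 1, 3])
```

The squared t from this test corresponds to the F ratio for the linear component tested against its own subject-by-trend error term.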

Bentler (1973), Nunnally (1967), and others advocate the use of a factor analytic technique to analyze longitudinal data. This procedure transposes the subject by test data matrix into a testing session by subject matrix and factor analyzes that, giving factors of people, each subject having a loading on each factor. If there were three factors, this would represent three different patterns of change over time. The problem with such analyses is that they rest on the assumption that individual differences in change can be grouped into types. It is this investigator's opinion that most differences in change over time among individuals are a matter of degree, not of type. Consequently, the factor solutions would not be very distinct.

Other less common procedures such as progressive partialling analysis (Nunnally, 1967), stochastic processes (Schutz, 1970), and time series (Gottman, McFall and Barnett, 1969) have potential as valuable statistical tools in explaining variability in patterns of change.

Mixed Longitudinal Cross-Sectional Designs

It has been shown that the two commonly used designs in studying developmental change both confound a component with the effects of age: longitudinal studies confound age and environmental or cultural effects, and cross-sectional studies confound age with generation differences. A third design is the time-lag study in which one age group is examined longitudinally, that is, a different sample of, say, 10-year-olds is selected and tested every 5 years. This design then, while not even accounting for age, confounds generational and cultural effects. The obvious solution is to combine all three designs in an attempt to remove the confounding effects. Schaie (1965) attempts to do this with his trifactor developmental model - a sequential research design which attempts to separate the effects of age, cohort, and time of measurement. The age effect indicates maturation of the individual, cohort effects should indicate hereditary effects, and time of measurement effects are indicative of changes due to environmental effects (although Baltes, 1968, suggests that the cohort component may also include environmental effects). Table 1 represents this sequential design. Note that the three rows represent three longitudinal studies, the columns represent cross-sectional designs (although only column 1960 samples all four cohort groups), and each of the four diagonals represents time-lag studies. Schaie formulates three equations, based on the premise that differences between cross-sectional measures, between longitudinal measures, and between time-lag measures are each a sum of the two components which are confounded in these designs. Through a process of subtraction, he can then get independent estimates of each of the three components: age, cohort, and time. Such a procedure, however, requires six subsamples in order to get these three independent estimates. The design represented in Table 1 would therefore not be sufficient, and would require cohorts at 1970 and 1980, with testing continuing to the year 2010 in order to get complete 30-year longitudinal data on 6 cohort groups.
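The process of subtraction can be made concrete with a small sketch. The equations are not printed in this paper, so the form used here is an assumption read directly from the verbal description: each observed difference is taken to be the plain sum of the two components that its design confounds. The function and argument names are hypothetical.

```python
def schaie_components(cross_sectional_diff, longitudinal_diff, time_lag_diff):
    """Assumed additive model, following the verbal description:
        cross-sectional difference = age + cohort
        longitudinal difference    = age + time
        time-lag difference        = cohort + time
    Solving the three equations gives each component by subtraction."""
    age = (cross_sectional_diff + longitudinal_diff - time_lag_diff) / 2
    cohort = (cross_sectional_diff + time_lag_diff - longitudinal_diff) / 2
    time = (longitudinal_diff + time_lag_diff - cross_sectional_diff) / 2
    return age, cohort, time

# Hypothetical differences generated by age = 2, cohort = 3, time = 1.
age_effect, cohort_effect, time_effect = schaie_components(5, 3, 4)
```

The arithmetic only recovers the components if the additive premise holds, which is the point on which the model has been questioned.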

[Insert Table 1 about here]

Baltes (1968), while acknowledging Schaie's contribution to methodology in developmental research design, raises two objections to the trifactor model. The first objection, certainly a valid one, is that the three components, age, cohort and time, are not really mutually independent. Any one component can be replaced by a linear combination of the other two, thus giving rise to Baltes' (1968) bifactor model of age and cohort. The second objection raised by Baltes concerns Schaie's definition of the variation accounted for by the time of measurement component. The effects of maturation and environment cannot be isolated through direct measurement, causing the time component to be a confounded variable itself.
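The first objection can be illustrated numerically. The sketch below assumes cohort is coded as year of birth and time of measurement as calendar year, so that time equals cohort plus age; that coding, the effect sizes, and the names are all illustrative assumptions. Two quite different sets of linear effects then predict identical scores in every cell, so the data cannot choose between them.

```python
def predict(age, cohort, b_age, b_cohort, b_time):
    """Linear effects of age, cohort (birth year) and time of measurement."""
    time = cohort + age  # time of measurement is fixed by the other two
    return b_age * age + b_cohort * cohort + b_time * time

cells = [(age, cohort) for age in (20, 30, 40) for cohort in (1900, 1910, 1920)]

# Model A: all change attributed to age and cohort, none to time.
model_a = [predict(a, c, 2.0, 1.0, 0.0) for a, c in cells]
# Model B: part of the age effect and all of the cohort effect moved to time.
model_b = [predict(a, c, 1.0, 0.0, 1.0) for a, c in cells]
```

Since the two lists are identical, only two of the three components can be estimated at once, which is consistent with a two-factor formulation.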






Using Baltes' bifactor model as the best available research design for development studies results in a classical p x q factorial design with repeated measures on the second factor (p being the number of cohort groups, and q the number of different age classifications under which each cohort group is tested). Such a design can be analyzed by the repeated measures analysis of variance given in Table 2. This allows for an analysis of the age effect, the cohort effect, as well as the interaction, which tests if the change over age is constant across the various cohort levels. Further polynomial breakdowns on both the age and the cohort main effects are possible.

[Insert Table 2 about here]

The bifactor and trifactor models of Baltes and Schaie, although accounting for age and cohort differences, still do not control for one of the major sources of invalidity in longitudinal studies, namely the effect of repeated testing. Both investigators, however, have made suggestions for testing and/or controlling for this effect. Essentially, these controls entail a separate control group for each cohort and age level. Thus, if the original cohort of 100 five-year-olds was to be tested four times over the span of the longitudinal study, it would be necessary to obtain four more groups of 100 five-year-olds, or, more practically, to subdivide the original 100 into five groups of 20 subjects each. Group I is tested at each testing session as in the usual longitudinal design, Group II is tested at time two and then discarded, Group III is also tested only once (time three), and Group IV is not tested until the fourth and final testing session. This design, and a possible statistical analysis, are given in Tables 3 and 4.

[Insert Tables 3 and 4 about here]
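The logic of the control groups can be sketched as a simple comparison at one testing session: the repeatedly tested group against a group from the same cohort tested for the first time. The function name and scores below are hypothetical, and the sketch gives only a descriptive estimate, not the significance tests of Table 4.

```python
from statistics import mean

def retest_effect(retested_scores, first_time_scores):
    """Mean difference, at one session, between subjects who have been
    tested before and same-cohort subjects tested for the first time.
    Both groups have aged equally, so the gap reflects practice alone."""
    return mean(retested_scores) - mean(first_time_scores)

# Hypothetical scores at time two.
group_1_time_2 = [52, 55, 53, 56]   # Group I, second testing
group_2_time_2 = [50, 51, 49, 52]   # Group II, first and only testing
practice = retest_effect(group_1_time_2, group_2_time_2)
```

Subtracting this estimate from the observed longitudinal change in Group I gives a rough indication of how much of that change is attributable to age rather than to practice.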
The ANOVA table for this design is admittedly rather complex. If the design is considered as a 2 x 4 x 4 factorial experiment with repeated measures on the last factor, then the ANOVA table becomes more obvious. (The three factors are: Practice - No Practice with 2 levels, 4 cohort groups within each level of P, and 4 age levels.) The problem is that there are repeated measures under P1 but not under P2, thus the differences among cohorts within levels of P are kept separate and different error terms are necessary to test these effects. Neither Baltes (1968) nor Schaie (1965) provides adequate descriptions of suitable statistical analyses for their designs. Baltes discusses it in a general way, and Schaie presents an ANOVA table for a complete factorial experiment with a randomized groups design. Failure to account for the repeated measures aspect of this design seems to be a serious flaw in Schaie's analysis.

It is interesting to note that the well-known Solomon Four-Group design (Solomon, 1949; Solomon and Lessac, 1968) is very similar to these cross-sequential designs which control for the testing effect. The primary difference is that Solomon's designs are pre-post only, rather than longitudinal.



The Attrition Problem in Longitudinal Studies

A serious problem confronting all researchers involved in longitudinal studies is subject attrition, whether it is movers, resisters, or deceased subjects. The two main concerns of the investigator are: how can missing subjects be retrieved? and what statistical procedures are appropriate for repeated measures designs with incomplete data?



Retrieval procedures. McAllister, Butler, and Goe (1973) provide detailed procedures for relocating subjects in longitudinal studies. Their accompanying flow chart is a virtual recipe of step-by-step procedures. Their strategy was utilized in 1972 in an attempt to locate a random sample of 600 subjects from a sample of 2661 original participants in a 1963 survey. The 1963 sample consisted of 9 to 14 year olds, thus the 1972 sample ranged in age from 18 to 24 years - a very mobile group. Despite this, and the nine year time span, McAllister and his coworkers were able to trace over 90% of the 600 subjects. County marriage records, Postal Service back files, telephone directories, criss-cross directories, County Voter Registration files, school transfer records, Public Utilities Credit offices (which are considerably cheaper than the often recommended Retail Credit Unions), and State Departments of Motor Vehicles all proved to be useful information sources.
Statistical analysis. Attrition is not a serious problem in those designs which employ concomitant control groups. However, the majority of longitudinal studies presently underway probably are of the simple basic design, that is, a single group of individuals has been tested at time zero and then observed and tested at regular intervals for a number of years following. By the end of year five it is quite possible only 75% of the original sample remains, and, to further complicate the analyses, replacement subjects have been added in an attempt to retain a relatively stable sample size. Assuming that the investigation involves more than one dependent variable, and that the researcher wishes to make statistical statements regarding the probability of significant changes while maintaining a relatively low experiment-wise error rate, then multivariate statistics are necessary, MANOVA being the most appropriate technique in most cases. Under these conditions there is only one option - delete from the statistical analysis all subjects for which there is not complete data. It does not matter if there are unequal numbers in the different groups (cohorts, or an a priori classification variable), but each subject must have a complete set of scores (i.e., each variable at each measurement period). It is as straightforward and unequivocal as that - delete all subjects with incomplete data. This applies only to the MANOVA analysis. There are a number of ways by which missing data can be replaced with estimators (e.g., Frane, 1976), but the basic assumption underlying all such methods is that the data are missing at random. As this is not the case in most longitudinal studies, such procedures are invalid.

Additional valuable information can be gained by comparing the variable means at time zero for the partial-data subjects with the complete-data subjects. This, of course, tells nothing about development, but it does provide an indication of the extent to which the MANOVA results can be generalized to the initial population. The adding of subjects to longitudinal studies after the initial measures have been taken is certainly not recommended. As well as the problem of incomplete data, there are also problems related to differential testing effects and selective sampling.
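The two steps recommended above, deleting subjects with incomplete data and then comparing time-zero means for the complete-data and partial-data subjects, can be sketched as follows. The data layout (a dictionary of score lists with None marking a missed session), the subject labels, and the function names are assumptions made for illustration.

```python
from statistics import mean

def split_by_completeness(records):
    """records: subject id -> list of scores, one per session (None = missing).
    Returns (complete-data subjects, partial-data subjects)."""
    complete = {s: r for s, r in records.items() if None not in r}
    partial = {s: r for s, r in records.items() if None in r}
    return complete, partial

def baseline_gap(records):
    """Time-zero mean of complete-data subjects minus that of partial-data
    subjects; a large gap suggests attrition was selective."""
    complete, partial = split_by_completeness(records)
    dropouts = [r[0] for r in partial.values() if r[0] is not None]
    return mean(r[0] for r in complete.values()) - mean(dropouts)

# Hypothetical three-session study.
records = {
    "s1": [100, 104, 108],
    "s2": [96, None, None],
    "s3": [110, 112, 115],
    "s4": [90, 95, None],
}
complete, partial = split_by_completeness(records)
gap = baseline_gap(records)
```

Only the subjects in `complete` would enter the MANOVA; the gap is reported alongside it as a check on generalizability.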

THE USE OF DIFFERENCE SCORES AS A MEASURE OF CHANGE

In a typical pretest-posttest repeated measures design, the resultant difference score, or gain score, is usually of primary interest to the researcher, despite its well known and frequently documented associated statistical problems. Objections to the use of difference scores have been made by methodologists for many years, were clearly defined by Bereiter (1963) approximately 15 years ago, and yet are still being made and debated today (Levin and Marascuilo, 1977). The following section examines different methods of computing criterion difference scores, and some possible adjustment procedures, and outlines the basic problems associated with the use of such scores.



Selection of a Criterion Score (unadjusted)

If the research methodology utilized yields a single score on the first administration of a test (X1) and another single score on a repetition of that test at some subsequent point in time (X2), then there is little choice in the criterion score to use if the researcher wishes to use a single, unadjusted, dependent variable. It has to be this difference (D = X2 - X1), which has many inherent deficiencies and numerous possible transformations to reduce these deficiencies (none of which are very satisfactory). These are discussed later. A more likely situation, however, is when there are a number of observations available for each S (e.g., heart rate at each minute of a 15-minute exercise bout, 30 learning trials), but the investigator wishes to reduce this data to a single change score or learning score. The problems then confronting him are: (1) how many trials should he use to estimate both the initial and final states of the Ss?, and (2) should he use the best, or the average, of each of these sets of trials? Before commenting on some possible solutions to these two problems, it should be noted that neither of these problems should ever arise when dealing with the analysis of change. Discarding or reducing data, when suitable statistical methods are available for analyzing all available data, seems like very inefficient research. If the goal is to be able to understand motor behavior, for purposes of explanation and prediction, then one must look at all the data, and analyze it by a repeated measures ANOVA, time series, or some other equally suitable tool. However, many investigators insist on obtaining a single change score, thus some discussion on these points seems necessary.

The problem of choosing between the best and the average score has only one acceptable solution - use the average. There is sufficient support for use of the average rather than the best in the general case (Baumgartner, 1974; Henry, 1965; Kroll, 1967) and in the specific case of difference scores it is even more necessary. The reliability of a difference score is so dependent upon the reliability of the two scores which produce this difference, that it is imperative that these two scores possess maximum reliability themselves - thus averages are necessary.
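The gain in reliability from averaging can be illustrated with the Spearman-Brown formula. The paper does not name this formula; it is brought in here as a standard classical test theory result, and it assumes the trials being averaged are parallel (equal true scores and equal error variances), which learning trials may not be. The function name and the example values are illustrative.

```python
def spearman_brown(single_trial_reliability, k):
    """Reliability of the mean of k parallel trials, given the reliability
    of a single trial (Spearman-Brown prophecy formula)."""
    r = single_trial_reliability
    return k * r / (1 + (k - 1) * r)

# A single trial with reliability .50, averaged over 4 trials.
averaged = spearman_brown(0.5, 4)
```

Under these assumptions an average of four trials is considerably more reliable than any one of them, which is the sense in which averages are necessary before a difference is taken.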



The solution to the question of the optimal number of trials to use in computing these pre- and post-score averages is not quite so unambiguous. The problem facing an investigator who uses a learning task is how can he choose a score which maximizes both reliability and discriminability at the same time? In a task which has, say, 20 trials, the difference between trial one and trial 20 will probably show the greatest discriminability as far as learning is concerned; however, it may not be very reliable. If one uses the average of the first ten trials as an indication of initial score, and the average of the last ten as the performance score, then the difference between these two may show high reliability, but it probably will not show much learning. Carron and Marteniuk (1970) pointed out the necessity for comparing the differences between both the reliabilities and discriminability obtained by grouping trials in different ways. Others (Baumgartner and Jackson, 1970; McCraw and McClenney, 1965) have attempted to give definitive rules for determining the number of trials and the measurement schedules one should employ. Because of the great variability in type of task, characteristics of Ss, etc., it does not seem possible to choose a specific rule for determining the "best" criterion measure for all situations - even for all situations involving a specific task or set of measures. If one decides that it is necessary to reduce the data to a single dependent variable (which, to this writer, does not seem to be a valid procedure), then by utilizing procedures as suggested by Carron and Marteniuk (1970), and following the basic principles of reliability and validity of dependent variable scores which have been frequently and explicitly laid out for us (e.g., Alexander, 1947; Burt, 1955; Feldt and McKee, 1957; Krause, 1969; Lomnicki, 1973; Schutz and Roy, 1973), one should be able to arrive at a procedure for selecting the most suitable criterion score in each specific situation.

Selection of a Criterion Score (adjusted)

In situations where there are only two opportunities for observation and measurement (pre and post), or where the investigator insists on reducing repeated measures to a pre-post case, then it is probably necessary to apply some type of statistical adjustment or correction factor to either the difference score or to the final score. The following section gives possible solutions for each of a number of common problems associated with using difference scores. These problems have been well-defined by many investigators (Bereiter, 1963; Cronbach and Furby, 1970; Lord, 1956, 1963; McNemar, 1958).

(i) Problem 1. Regression Effect: In general, on the second administration of a test, and in the absence of any true change or treatment effect, the observed scores for those who scored high on test #1 tend to decline and the observed scores of those who scored lowest on test #1 tend to increase on test #2.
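The regression effect is easy to reproduce by simulation. The sketch below assumes a classical true-score model with normally distributed true scores and errors, a test reliability of .60, and no true change between administrations; every parameter value and name is an illustrative assumption.

```python
import random
from statistics import mean

def simulate_regression_effect(n=10000, reliability=0.6, seed=1):
    """Two administrations of a test with NO true change.
    Returns the mean gain (X2 - X1) for the lowest and the highest
    tenth of scorers on the first administration."""
    rng = random.Random(seed)
    # True-score SD is fixed at 1; error SD follows from the reliability.
    error_sd = ((1 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [t + rng.gauss(0, error_sd) for t in true]
    x2 = [t + rng.gauss(0, error_sd) for t in true]
    order = sorted(range(n), key=lambda i: x1[i])
    lowest, highest = order[: n // 10], order[-(n // 10):]

    def mean_gain(indices):
        return mean(x2[i] - x1[i] for i in indices)

    return mean_gain(lowest), mean_gain(highest)

low_gain, high_gain = simulate_regression_effect()
```

The low scorers show an apparent gain and the high scorers an apparent loss, although no subject's true score has changed.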


Solutions. The most valid, and least complicated, solution is to use a homogeneous group so all Ss have essentially the same initial score. If the experiment involves comparisons between groups, then equate the group means initially, either by randomization with large sample sizes, blocking, matching, or statistically through analysis of covariance.



Another possible solution, the one to which psychometricians have directed
their attention, is to adjust the final score on the basis of the pre-post
linear regression effect. This can be done by fitting a regression
line to the pre-post scores (X1, X2) under the conditions of the null
hypothesis, i.e., no treatment effect, and then using the deviation from the
regression line as the dependent variable indicating true change (Lord,
1963). This requires either a separate control group or an (X1, X2) measure
for each subject under a treatment condition and a control condition - a
procedure which is not always possible. The most reasonable solution seems
to be to use analysis of covariance (ANCOVA), as it is essentially an analysis
of the X2 scores, adjusted on the basis of the regression line between X2
and X1.
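The regression adjustment described above can be sketched in a few lines. This is only an illustration, assuming NumPy is available; the function name `residual_change` and the scores are hypothetical, and the line is fitted to the same small data set rather than to a separate control group as the text recommends.

```python
import numpy as np

def residual_change(x1, x2):
    """Deviation of each posttest score from the fitted pre-post regression line."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Least-squares line X2 = intercept + slope * X1
    slope, intercept = np.polyfit(x1, x2, deg=1)
    return x2 - (intercept + slope * x1)

# Hypothetical pretest and posttest scores (illustrative only)
pre = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
post = np.array([13.0, 12.0, 16.0, 17.0, 19.0, 21.0])

raw_gain = post - pre                 # simple difference scores
resid = residual_change(pre, post)    # regression-adjusted scores

# Raw gains are negatively related to the pretest (regression effect);
# the residual scores are uncorrelated with the pretest by construction.
print(np.corrcoef(pre, raw_gain)[0, 1], np.corrcoef(pre, resid)[0, 1])
```

In this made-up example the raw gains fall as the pretest score rises, whereas the adjusted scores carry no linear dependence on the initial score.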



(ii) Problem 2. Measurement Errors or the Unreliability-Invalidity
Dilemma: The degree to which measurement errors exist in the initial and/or
final measures, along with the degree to which the X1, X2 correlation
exceeds zero, is reflected by a reduction in the reliability of the X1-X2
difference score.

Solutions. There exists a wealth of information on possible solutions
to this problem (e.g., Lord, 1956, 1963; McNemar, 1958; Ng, 197?; Tucker,
1966; Wiley and Wiley, 1974).

The basic thesis of these papers is that it is possible to compute
a reliability coefficient corrected for attenuation, that is, the
reliability of a difference between "true scores" (errorless measures yielding
reliabilities of 1.00 in both X1 and X2). Once having obtained a
reliable estimate of true difference, it is then possible to take this
attenuated reliability coefficient and multiply it by the observed X1-X2
difference (but scaled as deviations from the means), thus obtaining a
hypothetical true difference score or "regressed score" (McNemar, 1958).

Although this is the basis of the solutions adopted by many psychometricians,
it has its deficiencies, the primary one being that the number of
alternate ways to compute this true gain score seems to be exceeded only
by the number of papers written on the topic. The non-specialist is left
with a morass of equations and confusion. Another deficiency with the
use of estimated true difference scores is that the regression coefficient
used in the prediction equation is based on a number of assumptions, some
of which may not always hold true. A recent report by Wiley and Wiley
(1974) indicates that the assumption of independence of errors of measurement
between tests is frequently violated, thus giving overestimates of
the attenuated reliability coefficient. This in turn would result in
overestimates of the true gain score.

(iii) Problem 3. Equality of Scale Along the Range of Scores (the
Physicalism-Subjectivism Dilemma): An observed score at the low range
of the continuum may be measuring an attribute of behavior quite different
from that which is reflected by the same test at the high end of the range
of scores.

Solutions: There seem to be no adequate solutions per se for this
problem. One could use P-technique methodology (a sort of factor analysis
appropriate for change data) to test the assumption that the two measures
are in fact measuring the same thing (Bereiter, 1963; Cattell, 1963).
However, this is not a solution, but rather a technique to reveal the
existence or non-existence of a problem. The answer seems to be in
finding ways to avoid the problem rather than solve it, and this can
be accomplished to a limited degree. If all groups are equated initially
with respect to their scores on the dependent variable, then any differences
between groups in the amount of change within groups can be logically
interpreted (Schmidt, 1972).



This restriction allows for the conclusion that one group changed more, or
less, with regard to the particular dependent variable being used. If one
group showed very large changes, and the other group very small ones, then
it may be difficult to interpret the meaning of the relative magnitudes of
change scores, but it is still possible to state that one group showed
significantly greater change than the other group on that particular trait.

A General Solution to the Problems Associated with Difference Scores

At this point the reader must be wondering, "Is there no adequate solution to
the problem of measuring change?" My answer is "Yes, there are adequate methods,
but not through the use of difference scores." If one must use a change score,
then perhaps the "best" estimator of a true difference score is Cronbach and Furby's
"complete estimator" (1970):

$$\hat{D}_\infty = \hat{X}_{2\infty} - \hat{X}_{1\infty}$$

where $\hat{D}_\infty$ is the estimated "true difference score" and $\hat{X}_{1\infty}$ is the estimated true score at time 1, taking
into account numerous other categories of variables, W, which may be multivariate
in nature and relate to the pre or post scores in some manner. The true score for
X1 is estimated as:

$$\hat{X}_{1\infty} = \rho_{XX'} X_1 + \frac{\sigma_{X_{1\infty}(X_2 \cdot X_1)}}{\sigma^2_{(X_2 \cdot X_1)}} (X_2 \cdot X_1) + \frac{\sigma_{X_{1\infty}(W \cdot X_1, X_2)}}{\sigma^2_{(W \cdot X_1, X_2)}} (W \cdot X_1, X_2) + \text{constant}$$

where $(X_2 \cdot X_1)$ and $(W \cdot X_1, X_2)$ are partial variates, and the subscript $\infty$ denotes a true score. The purpose of presenting
this equation is not to provide the reader with a useful statistical tool, but
rather to point out the extreme degree to which the raw data can be transformed if
one wishes some sort of pure measure. The difficulty in interpreting this
transformed score is obvious, at least in terms of predictable observed behavior.

Two quotes provide a suitable summary of this investigator's position on the
use of difference scores:

    "Both the history of the problem and the logic of investigation
    indicate that the last thing one wants to do is think in terms of
    or compute such change scores unless the problem makes it absolutely
    necessary." (Nunnally, 1973, p. 87)

    "Gain scores are rarely useful, no matter how they may be
    adjusted or refined." (Cronbach and Furby, 1970, p. 68)



The Statistical Analysis of Difference Scores

Given a single group, pretest-posttest design, there are two equivalent
ways to test the null hypothesis of a zero mean difference, namely a t test for
correlated means or a one-way repeated measures ANOVA (the F ratio of the ANOVA
will be identical to t²). Of concern here are the consequences of the unreliability
of the difference scores.
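The equivalence of the two tests can be checked numerically. The following is a minimal sketch assuming NumPy; the function names and the six pairs of scores are hypothetical and serve only to show that the repeated measures F equals the squared correlated-means t when there are two trials.

```python
import numpy as np

def paired_t(pre, post):
    """t statistic for correlated (paired) means."""
    d = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

def repeated_measures_f(scores):
    """F ratio for Trials in a one-way repeated measures ANOVA.

    scores: array of shape (n subjects, k trials).
    """
    y = np.asarray(scores, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_trials = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_subjects = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_total = ((y - grand) ** 2).sum()
    # What remains is the Subjects x Trials interaction (the error term)
    ss_error = ss_total - ss_trials - ss_subjects
    ms_trials = ss_trials / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return ms_trials / ms_error

# Hypothetical pretest and posttest scores (illustrative only)
pre = np.array([10.0, 12.0, 15.0, 18.0, 20.0, 25.0])
post = np.array([13.0, 12.0, 16.0, 17.0, 19.0, 21.0])

t = paired_t(pre, post)
f = repeated_measures_f(np.column_stack([pre, post]))
print(t ** 2, f)  # the two values agree up to rounding error
```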

The measurement specialist is primarily concerned about reliability as a
phenomenon in itself, placing high value on reliabilities near 1.0 and showing
abhorrence at values of less than .50. Assuming that the reliabilities of the
pretest and posttest are the same (r), and given the correlation between pretest
and posttest as r12, then the reliability (rd) of the difference score is:

$$r_d = \frac{r - r_{12}}{1 - r_{12}}$$

Thus as r12 approaches r, the reliability of the difference score approaches zero.
In order to attain a high rd, the magnitude of r12 must be small relative to r;
e.g., if r12 = .25 and r = .75, then rd = .67. However, this does little to appease
the measurement specialist, as an r12 of .25 suggests that the test is not measuring
the same attribute at each point in time. Consequently, the researcher either
avoids difference scores or attempts to "correct" them as discussed earlier in
this paper.
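The dependence of the difference-score reliability on r12 can be tabulated directly. The sketch below assumes the standard formula for equal pretest and posttest reliabilities, rd = (r - r12) / (1 - r12), which reproduces the .67 example given in the text; the function name is hypothetical.

```python
def difference_score_reliability(r, r12):
    """Reliability of X2 - X1 when both tests have reliability r
    and correlate r12 with each other (equal variances assumed)."""
    if r12 >= 1.0:
        raise ValueError("r12 must be below 1.0")
    return (r - r12) / (1.0 - r12)

# Reliability of the difference for a fixed test reliability of .75
for r12 in (0.0, 0.25, 0.50, 0.75):
    print(r12, round(difference_score_reliability(0.75, r12), 2))
```

The printed values fall steadily as r12 rises toward r, which is the pattern that troubles the measurement specialist.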

The statistician, on the other hand, views low reliability in difference scores
with fewer misgivings, because as this reliability decreases, the power of the
statistical test increases. As is shown above, it is the value of r12 which is of
importance (for a fixed value of r). This can be demonstrated for both ANOVA
and t tests. In the latter, the variance of the difference approaches zero as r12
approaches 1.0, thus minimizing the denominator and maximizing the calculated t.
For a common variance (S1² = S2² = S²) and r12 = 1.0:

$$S_1^2 + S_2^2 - 2 r_{12} S_1 S_2 = 2S^2 - 2S^2 = 0$$

Similarly for the F ratio in ANOVA: as r12 approaches 1.0, the Subjects by Trials
interaction approaches zero, thus maximizing the F ratio for the Trials effect.
Thus, although the reliability of the tests themselves should be important
to researchers, the reliability of the difference scores may not be that crucial.
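The effect of r12 on the t test can be seen from the standard error of the mean difference, whose square is (S1² + S2² - 2 r12 S1 S2) / n. The following sketch uses only the Python standard library; the function name and the chosen standard deviations and sample size are illustrative assumptions.

```python
import math

def se_mean_difference(s1, s2, r12, n):
    """Standard error of the mean difference for correlated means."""
    var_d = s1 ** 2 + s2 ** 2 - 2.0 * r12 * s1 * s2  # variance of X2 - X1
    return math.sqrt(var_d / n)

# Common standard deviation of 2.0 and 20 subjects (illustrative values)
for r12 in (0.0, 0.5, 0.9, 1.0):
    print(r12, se_mean_difference(2.0, 2.0, r12, 20))
```

For a fixed mean difference, the denominator of t shrinks as r12 grows and vanishes at r12 = 1.0, so the calculated t increases without bound.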



UNIVARIATE ANOVA AND MANOVA

The analysis of all of the available data should provide an investigator with
more information than does the limited, and suspect, information provided by a
difference score. These repeated measures analyses may be performed by either
univariate or multivariate analysis of variance (ANOVA, MANOVA) on the raw scores
or on scores adjusted for initial differences between groups. The more information
available on the nature of change in behavior over time, the greater should
be the degree of understanding of the nature and causes of that change.
Consequently, in an experiment involving any length of time between the initiation of
the treatment and the final observation, it is desirable to take numerous measures
per S. Although in some cases it is not possible to do this, either due to the
contamination effect of the measurement tool or to the nature of the treatment
procedures, in most motor behavior studies such repeated measures are quite
feasible.



Repeated Measures ANOVA

The common method for analyzing change for a repeated measures design is
through a repeated measures or Ss x Treatments ANOVA. Given a typical
experiment involving two treatment groups (or a treatment and control) with 20
Ss nested within each group and repeated across, say, 10 trials (Fig. 1), one
appropriate method for analyzing change could be to break down the total
variability as given in Table 5.

[Insert Fig. 1 and Table 5 about here]

The effects of most interest here, with respect to the analysis of change,
are the Groups x Trials interaction and its trend analysis components, Groups x Trials
(Linear) and Groups x Trials (Quadratic). The Groups x Trials interaction
indicates the degree to which the change over trials is the same for each
group - which is probably the research question of most interest; i.e., is
there a significant change in behavior over the time span of the experiment,
and, if so, does this change show the same, or different, characteristics
between the two experimental groups?
The Groups x Trials (Linear) asks essentially the same question but with the
constraint that the change over time is linear. In this case a linear function
is forced on the data and the test of significance tests for equality of slopes
between the two groups, which in behavioral terms amounts to a comparison of
the rates of learning, rates of recovery, etc. Similarly, the Groups x Trials
(Quadratic) compares the two treatment groups on the basis of the degree of
curvature or time of plateauing of the scores over time.
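The linear and quadratic trend components can be illustrated by reducing each subject's trials to orthogonal polynomial scores. This is a sketch under stated assumptions: NumPy is available, trials are equally spaced, the function names are hypothetical, and the data are noise-free straight lines invented for the example. Because the contrasts come from a QR decomposition, their signs are arbitrary, so only magnitudes are compared.

```python
import numpy as np

def orthonormal_poly_contrasts(k, degree=2):
    """Orthonormal polynomial contrasts for k equally spaced trials.

    Returns an array of shape (degree, k): row 0 is linear, row 1 quadratic.
    """
    t = np.arange(k, dtype=float)
    vander = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    q, _ = np.linalg.qr(vander)                         # orthonormalize the columns
    return q[:, 1:].T                                   # drop the constant term

def trend_scores(scores, degree=2):
    """Per-subject trend components, shape (n subjects, degree)."""
    y = np.asarray(scores, dtype=float)
    return y @ orthonormal_poly_contrasts(y.shape[1], degree).T

# Hypothetical data: two subjects per group, 10 trials, purely linear change
trials = np.arange(10, dtype=float)
group1 = np.vstack([5.0 + 1.0 * trials, 7.0 + 1.0 * trials])  # slope 1
group2 = np.vstack([5.0 + 2.0 * trials, 6.0 + 2.0 * trials])  # slope 2

lin1, quad1 = trend_scores(group1).T
lin2, quad2 = trend_scores(group2).T
print(np.abs(lin1), np.abs(lin2))  # group 2 linear scores are twice group 1
print(quad1, quad2)                # essentially zero: no curvature in the data
```

With real data, a comparison of the two groups on the linear scores corresponds to the Groups x Trials (Linear) question, and a comparison on the quadratic scores to the Groups x Trials (Quadratic) question.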

This analysis then provides one possible solution for the analysis of
change suitable for many experimental conditions. By using a number of measures
instead of just two, the problems of regression effect and measurement errors
are greatly reduced. The unreliability of the data is reflected by the magnitude
of the S x Trials interaction (or in this case the S(G) x T) and is thus
a sort of built-in protection against making erroneous research conclusions
based on unreliable data. The less reliable the data are, the larger the
S x Trials error term, the more difficult it is to attain statistical
significance, and the less likely it is to make a Type I error.

The repeated measures ANOVA is not the ideal solution to the problems of
analyzing change, however, for a number of reasons. Firstly, the tests of
significance give limited information regarding the nature or form of the
change over time, as the trend analyses fit only polynomials to the data,
data which are frequently better fitted by a logarithmic or exponential
function. Secondly, it deals with mean values only and does not reveal reliable
differences between subjects (within the same group) with respect to
intra-individual behavioral changes over time (a stochastic model would detect this).
Finally, and perhaps most importantly, the nature of the data common to most
studies in motor behavior is such that it violates the assumptions on which
the repeated measures ANOVA is founded. These assumptions are that the
measures (i) are normally distributed, (ii) exhibit equal variances under all
treatment conditions, and (iii) have equal covariances between all treatment
pairs (the precise mathematical assumption is that all covariances equal zero,
but the F ratio is virtually unaffected by violation of this assumption,
providing all covariances are equal). While the first two of these assumptions
are usually met with motor performance data, the third one rarely is.



This assumption can be casually tested by examining the correlation matrix of
the repeated measures - the degree to which all correlations are not equal
indicates the degree to which this assumption is violated.¹ It is frequently
the case in our field of study to obtain data in which adjacent trial
correlations are very high, but diminish as a function of the number of intervening
observations between any two measures. The result of this situation is an
inflated F value and a substantial increase in the probability of committing a
Type I error (as high as p = .15 when assuming a p = .05).

The analysis of variance for repeated measures, which was first presented
here as a possible solution to some of the problems inherent in the analysis
of change, has now become a problem itself. There are two possible ways by
which ANOVA may be validly used on repeated measures data which exhibit
unequal between-trial correlations:

(1) Inflate the magnitude of the F needed for significance by reducing
    the associated degrees of freedom (d.f.). Box (1954) has suggested
    that the d.f. for both the numerator and denominator be multiplied
    by a factor ε, which is a function of the degree of heterogeneity of
    both the variances and the covariances. The greater the heterogeneity,
    the smaller the calculated ε and the larger the F value must be in
    order to reject the null hypothesis.

(2) Greenhouse and Geisser (1959) questioned the validity of the estimator
    ε and its effect on the approximate F distribution. They suggested
    the use of the minimum possible value of ε, namely 1/(k-1), where k
    is the number of levels of the repeated factor, as the factor which
    should be applied to the d.f. in all situations. Although this is a
    statistically valid technique, it is very conservative, thus resulting
    in a rather large probability of committing a Type II error.

There are a number of excellent articles available which provide a lucid
explanation of both the problem and the merits of these solutions (e.g.,
Davidson, 1972; Gaito, 1973; Gaito and Wiley, 1963; McCall and Appelbaum,
1973; Mendoza, Toothaker and Nicewander, 1974).
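The two degrees-of-freedom corrections can be illustrated numerically. The sketch below assumes NumPy and uses the commonly cited form of Box's factor, ε = (tr A)² / ((k-1) tr A²), where A is the double-centered variance-covariance matrix of the k trials; this formula is supplied here as an assumption and is not quoted from the text. The function name and both example matrices are hypothetical.

```python
import numpy as np

def box_epsilon(cov):
    """Box's correction factor for a k x k variance-covariance matrix."""
    s = np.asarray(cov, dtype=float)
    k = s.shape[0]
    centering = np.eye(k) - np.ones((k, k)) / k
    a = centering @ s @ centering        # removes the overall (subject) level
    return np.trace(a) ** 2 / ((k - 1) * np.trace(a @ a))

k = 4
# Equal variances and equal covariances: the ANOVA assumption holds
compound = 0.5 * np.eye(k) + 0.5 * np.ones((k, k))
# Correlations that diminish with the number of intervening trials
lags = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
simplex = 0.8 ** lags

eps_equal = box_epsilon(compound)
eps_simplex = box_epsilon(simplex)
lower_bound = 1.0 / (k - 1)              # the Greenhouse and Geisser value

n = 20                                   # hypothetical number of subjects
adjusted_df = (eps_simplex * (k - 1), eps_simplex * (k - 1) * (n - 1))
print(eps_equal, eps_simplex, lower_bound, adjusted_df)
```

When all covariances are equal the factor is 1 and no correction is applied; the diminishing-correlation pattern gives a value between the lower bound and 1, so the d.f. of the F test are reduced.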



¹Procedures for statistical tests of this assumption are available in
Winer (1971, p. 594).
Repeated Measures MANOVA

The other solution to the problem of non-homogeneity of covariances is to
use a technique which does not require this assumption - namely the multivariate
analysis of variance. MANOVA requires no assumption regarding the homogeneity
of covariances and allows for an exact statistical test based on a known
significance level. Although this technique has been available for many years,
it has not been adopted by practicing researchers due to its extreme
computational complexity. However, the present accessibility of suitable computerized
multivariate statistical packages at most universities has eliminated such an
excuse for ignoring this very useful test, and it should now be a standard
statistical tool for all researchers. Very briefly, what MANOVA does is to
transform the k repeated measures for each subject into a set of (k-1) scores
through the application of independent contrasts (these are usually orthogonal
polynomials, but they need not be, as the resulting significance test is
independent of the choice of contrasts). An analysis of variance type procedure
is then carried out on the vector of means of these derived scores, with the
mean square error being a variance-covariance matrix of within cell variabilities
rather than a unitary scalar value as in the univariate procedure. The tests
of significance provide an F ratio for the overall multivariate hypothesis
that the trial means are equal and, for a two group experiment, that the change
in performance across repeated measures is the same for each group. An overall
significant F on these multivariate hypotheses allows the investigator to use
appropriate follow-up tests while maintaining an overall pre-determined level
of significance. These follow-up procedures can take the form of simultaneous
confidence intervals, step-down F ratios, or even the usual univariate t tests
on each dependent variable separately or on the single d.f. contrasts associated
with trend analysis (see Spector, 1977, for a good review of procedures).
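The transformation to (k-1) contrast scores, and the claim that the test does not depend on which contrasts are chosen, can be sketched for the single-group case with a one-sample Hotelling T² statistic. This is an illustration only: NumPy is assumed, the function name is hypothetical, the data are simulated with a fixed seed, and Hotelling's T² is used here as the simplest instance of the multivariate test rather than as the procedure of any particular package.

```python
import numpy as np

def hotelling_t2_trials(scores, contrasts):
    """Test that all k trial means are equal, using (k-1) contrast scores.

    scores: (n subjects, k trials); contrasts: (k-1, k), rows summing to zero.
    Returns T^2 and the exact F on (k-1, n-k+1) d.f.
    """
    y = np.asarray(scores, dtype=float) @ np.asarray(contrasts, dtype=float).T
    n, p = y.shape
    mean = y.mean(axis=0)                 # vector of means of the derived scores
    cov = np.cov(y, rowvar=False)         # their variance-covariance matrix
    t2 = n * mean @ np.linalg.solve(cov, mean)
    f = (n - p) / (p * (n - 1)) * t2
    return t2, f

# Simulated scores: 12 subjects, 4 trials, with a small upward trend
rng = np.random.default_rng(0)
scores = rng.normal(size=(12, 4)) + 0.5 * np.arange(4)

# Two different sets of independent contrasts
successive = np.array([[-1, 1, 0, 0], [0, -1, 1, 0], [0, 0, -1, 1]])
helmert = np.array([[1, -1, 0, 0], [1, 1, -2, 0], [1, 1, 1, -3]])

t2_a, f_a = hotelling_t2_trials(scores, successive)
t2_b, f_b = hotelling_t2_trials(scores, helmert)
print(t2_a, t2_b)  # the two contrast sets give the same statistic
```

Because any full set of (k-1) independent contrasts spans the same space, the statistic is unchanged, which is the invariance the paragraph above describes.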

Another frequently used procedure associated with MANOVA is discriminant
analysis, which tests whether two or more groups can be significantly separated
on the basis of their profiles (or, in the RM design, their pattern of change
over time). It has been shown, however, that a Groups x Trials ANOVA is more
versatile in detecting the nature of the differences between group profiles
than is discriminant analysis (Thomas and Chissom, 1973). Although Thomas
and Chissom failed to consider the restrictive assumption inherent in the
univariate G x T ANOVA, this is not a factor if the Trials effect is broken
down into polynomial coefficients (linear, quadratic, etc.).



This essentially converts the univariate procedure to a multivariate technique
and thus no longer requires the assumption of equal covariances. Bock (1963),
Cole and Grizzle (1966), and Finn (1969) have provided comprehensive discussions
on the application of MANOVA to repeated measures data, and comparisons of the
applications and outcomes of ANOVA versus MANOVA are well given by Davidson
(1972), Hummel and Sligo (1971), McCall and Appelbaum (1973), and Poor (1973).

It must be pointed out that not all statisticians or psychometricians favor
multivariate methods. Kempthorne (1966, p. 521) has stated that, "I have yet to
see any convincing examples of experimental data in which the standard techniques
of multivariate analysis have led to scientific insight." Perhaps the choice
between these two types of analyses can be based on whether the experimental
study is primarily concerned with "information finding" or with "decision making".
Univariate procedures may allow for greater (or easier) interpretation of the
data, and thus support the information finding approach, whereas multivariate
techniques (MANOVA in particular), by providing an exact probability statement,
are most suitable for decision making.



Table 6 provides for a comparison of the relative powers of multivariate
(MV) and univariate (UV) F tests. The symbol MUV refers to a repeated measures
univariate ANOVA which has been modified by the Greenhouse and Geisser technique.
Notice that for small N the MV procedure lacks power in all cases. For large N
(20 more than the number of dependent measures) and a relatively large
non-centrality parameter (δ), the slightly greater power of the UV over the MV
procedure is more than compensated for by the lower experimentwise error rates
in the MV methods. In these situations a MANOVA would seem preferable to an
ANOVA.
CONCLUSION

There are obviously a considerable number of problems inherent in the
measurement and analyses of change, especially in research designs of a
longitudinal nature. However, most of these problems can be avoided provided sufficient
care and planning are taken prior to initiating the research project. The
cross-sectional sequential type designs which are required for valid measures of
developmental change are very costly - but necessary if the research is to have
any scientific value. Multivariate statistical procedures utilizing complete
data sets will provide for valid and relatively powerful tests of hypotheses.



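                                 Trials

                    T1          T2         . . .    T10

Group 1    S1       X1,1,1      X1,1,2     . . .    X1,1,10
           S2       X1,2,1      X1,2,2     . . .    X1,2,10
           .
           .
           S20      X1,20,1     X1,20,2    . . .    X1,20,10

Group 2    S1       X2,1,1      X2,1,2     . . .    X2,1,10
           S2       X2,2,1      X2,2,2     . . .    X2,2,10
           .
           .
           S20      X2,20,1     X2,20,2    . . .    X2,20,10

Fig. 1. Schemata of 2 x 10 Factorial Experiment with
        Repeated Measures on the Second Factor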



TABLE 1

SEQUENTIAL RESEARCH DESIGN GIVING AGES OF COHORT GROUPS
AT EACH TESTING TIME

                          Time of Measurement
Cohort     1930    1940    1950    1960    1970    1980    1990

1930         5      15      25      35
1940                 5      15      25      35
1950                         5      15      25      35
1960                                 5      15      25      35



TABLE 2

ANALYSIS OF VARIANCE FOR A p x q BIFACTOR
DEVELOPMENTAL DESIGN

Source of Variation          df             Mean Square    F

Among Cohorts (C)            p-1            MS_C           MS_C/MS_SwC
Ss within Cohorts (SwC)      p(n-1)         MS_SwC
Age (A)                      q-1            MS_A           MS_A/MS_SwCA
Cohort x Age (CA)            (p-1)(q-1)     MS_CA          MS_CA/MS_SwCA
SwCohort x Age (SwCA)        p(n-1)(q-1)    MS_SwCA



TABLE 3

EXPERIMENTAL LAYOUT FOR A BIFACTOR DEVELOPMENTAL
DESIGN WITH CONTROL GROUPS FOR TESTING EFFECTS

                           Age at Time of Testing
Cohort                      5      15      25      35

1930    (S1-20)             √      √       √       √
        (S21-40)            √      X       X       X
        (S41-60)            X      √       X       X
        (S61-80)            X      X       √       X
        (S81-100)           X      X       X       √

1940    (S1-20)             √      √       √       √
           .
           .
        (S81-100)           X      X       X       √

  .
  .

1960    (S1-20)             √      √       √       √
           .
           .
        (S81-100)           X      X       X       √

X - denotes no testing at this time.
√ - denotes testing done at this time.
TABLE 4

ANALYSIS OF VARIANCE FOR A BIFACTOR DEVELOPMENTAL
DESIGN WITH CONTROL GROUPS FOR TESTING EFFECTS

Source of Variation                   df(a)

Practice Effects (P)                    1
Cohorts within P1 (CwP1)                3
  Subjects within CwP1 (SwCwP1)        76     - error term for CwP1
Cohorts within P2 (CwP2)                3
  Within Cell in P2 (Error P2)        304     - error term for CwP2
Error for P (Error P)                 380     - pooled Error P2 and SwCwP1

Age (A)                                 3
A x P                                   3
A x CwP1                                9
A x CwP2                                9
SwCwP1 x A                            228     - error for A x CwP1
Pooled Error                          608     - Error P + SwCwP1 x A
                                              - error term for A, A x P,
                                                A x CwP2

Total                                 639

(a) The degrees of freedom are based on the design given in Table 3.
TABLE 5

Analysis of Variance, with Trend, for a 2 x 10 Factorial Experiment
with Repeated Measures on the Second Factor

Source                   df      Mean Square    F Ratio

Groups                     1     MS_G           MS_G/MS_S(G)
Ss within Groups          38     MS_S(G)

Trials                     9     MS_T           MS_T/MS_S(G)T
  Linear                   1     MS_TL          MS_TL/MS_S(G)T
  Quadratic                1     MS_TQ          MS_TQ/MS_S(G)T
  Residual                 7     MS_TR          MS_TR/MS_S(G)T
Groups x Trials            9     MS_GT          MS_GT/MS_S(G)T
  G x T Lin.               1     MS_GTL         MS_GTL/MS_S(G)T
  G x T Quad.              1     MS_GTQ         MS_GTQ/MS_S(G)T
  G x T Resid.             7     MS_GTR         MS_GTR/MS_S(G)T
SwG x Trials             342     MS_S(G)T

Total                    399
Table 6.
Relative Power of Multivariate and Univariate F Tests

                                                    Power
No. of                      Equal
Trials     δ      F Test    var.-cov.?      n=k+1      n=k+20

  3       1.0     UV        Yes              .21        .30
                  MUV       Yes              .07        .18
                  MUV       No(a)            .?3        .38
                  MV                         .12        .28

  3       2.0     UV        Yes              .66        .86
                  MUV       Yes              .34        .75
                  MUV       No(a)            .64        .91
                  MV                         .30        .83

  6       1.0     UV        Yes              .39        .45
                  MUV       Yes              .03        .07
                  MUV       No(a)            .54        .65
                  MV                         .11

  6       2.0     UV        Yes              .97        .99
                  MUV       Yes              .46        .76
                  MUV       No(a)            .98        .99
                  MV                         .26        .93

(a) For a specific case of non-uniform variance-covariance matrix.